Abstract. The minimization of Fisher's information (MFI) approach of Frieden et al. [Phys. Rev. E 60, 48
(1999)] is applied to the study of size distributions in social groups on the basis of a recently established
analogy between scale-invariant systems and classical gases [arXiv:0908.0504]. Going beyond the ideal gas
scenario is seen to be tantamount to simulating the interactions taking place in a network's competitive
cluster growth process. We find a scaling rule that allows one to classify the final cluster-size distributions
using only one parameter, which we call the competitiveness. Empirical city-size distributions and electoral
results can thus be reproduced and classified according to this competitiveness, which also allows one to
correctly predict well-established assessments such as the "six degrees of separation", shown here to be a
direct consequence of the maximum number of stable social relationships that one person can maintain,
known as Dunbar's number. Finally, we show that scaled city-size distributions of large countries follow
the same universal distribution.

PACS. 89.70.Cf Entropy and other measures of information - 05.90.+m Other topics in statistical physics,
thermodynamics, and nonlinear dynamical systems - 89.75.Da Systems obeying scaling laws - 89.75.-k
Complex systems



1 Introduction 

Regularities reflected in either scaling properties [1] or
power laws [2,3,4] appear in different scenarios related to
social groups. One of the most intriguing is Zipf's law [5], a
power law with exponent −2 for the density distribution
function that is observed in describing urban agglomerations
[5] and firm sizes all over the world [7]. This fact
has received a remarkable degree of attention in the literature.
The above-mentioned regularities have been detected
in other contexts as well, ranging from percolation theory
and nuclear multi-fragmentation [5] to the abundances of
genes in various organisms and tissues, the frequency
of words in natural languages [9,10], scientific collaboration
networks [11], the total number of citations of physics
journals [12], Internet traffic [13], and Linux package
links [14]. More recently, R. N. Costa Filho et al. [15] found
another special regularity in the density distribution function
of the number of votes in the Brazilian elections: a
power law with exponent −1. This law has also been found
in Ref. [16], using an information-theoretic methodology
[17], for both the city-size distribution of the province of
Huelva (Spain) and the results of the 2008 Spanish General
Elections. These findings allow one to conjecture that
this behavior reflects a second class of universality.
What all these disparate systems have in common is the
lack of a characteristic size, length or frequency for the
observable under scrutiny, which makes them scale-invariant.
In Ref. [16] we have introduced an information-theoretic 
technique based upon the minimization of Fisher's infor- 
mation measure [17] (abbreviated as MFI) that allows for 
the formulation of a "thermodynamics" for scale-invariant 
systems. The methodology establishes an analogy between 
such systems and physical gases which, in turn, shows 
that the two special power laws mentioned in the pre- 
ceding paragraph lead to a set of relationships formally 
identical to those pertaining to the equilibrium states of a 
scale invariant non-interacting system, the scale-free ideal 
gas (SFIG). The difference between the two distributions 
is thereby attributed to different boundary conditions on 
the SFIG. 

However, there are many social systems that cannot be
included in either of these two universality classes and
exhibit different kinds of behavior [16]. In order to deal with
them, researchers have in recent years worked out
different mathematical models and thus addressed urban
dynamics [18] and electoral results [15,19], developing
detailed realistic approaches. Ref. [20] is highly recommendable
as a primer on urban modelling. However, some aspects
of the concomitant problems defy full understanding,
since a clear prescription for the classification of the
size distributions of social groups is still missing. Remedying
this gap in understanding is our main purpose here.
The goals and motivation of this work are thus focused
on gaining insight into such size distributions in the case
of systems that cannot be described by recourse to the
two power laws described above, i.e., by a non-interacting
scenario. If an analogy with real gases is worked out when
interactions are duly taken into account, a microscopic
description is needed in order to obtain the pair correlation
function [21]. This is achieved using numerical simulations,
as in molecular dynamics. A similar path will be
followed here by recourse to the Fisher-derived analogy
of [16]. Thereby we go beyond the SFIG stage by using
a proportional growth process (PGP) so as to model the
interaction between the elements of the social network system.
We can thus study the PGP effect on the density distribution.
This requires having at hand a way to properly
describe scale invariance at the microscopic level [16,22]
via a competitive cluster growth process within a complex
network.

This work is organized as follows: in Sec. II we describe
the application of the MFI approach [17] to complex networks
in order to obtain the degree distribution and thus
describe the competitive cluster growth process (inside the
network). This allows one, in turn, to microscopically simulate
growth processes in a social group. In Sec. III we
study the size distributions obtained using this methodology.
We find a scale transformation that allows for systematically
classifying the deviations from the SFIG that
we encounter in the cluster-size distributions. This classification
is effected using just a single parameter, which
we call the competitiveness. We also apply this criterion to
classify the city-size distributions of the provinces of Spain
and some electoral results. Moreover, we show that empirical
assessments such as the average path length and Dunbar's
number are well reproduced by our approach. Using such
a scale transformation we demonstrate that most distributions
of city population in large countries exhibit the
same shape. Finally, in Sec. IV we draw some conclusions.



2.1 The scale free ideal network 

The basic elements of networks are "nodes" that are connected
to other nodes by "edges". The degree k of each
node is defined as the number of connections it possesses.
The degree distribution (DD) F(k) and the way the nodes
are connected define the statistical properties of the network.
Scale-free complex networks display many interesting
properties that have been found in techno-sociological
systems such as the Internet (for example, the World Wide
Web [23], e-mail networks [24], and instant-messaging
networks [25]).

In our model, we assume that the network can be described
at the macroscopic level as a scale-invariant system
of N nodes, with the number of connections k being the
"coordinate" that locates each node in the pertinent configuration
space. We also assume that the degree of each node does
not depend on the degree of other nodes. In these circumstances
we can i) legitimately describe the network as a
SFIG in an equilibrium state, a scenario to be denoted as
the scale-free ideal network (SFIN), and ii) derive the DD
via the MFI, which we now recapitulate.



2.1.1 Minimum Fisher Information approach (MFI) 

The Fisher information measure I for a system of N elements,
described by the coordinate k and the physical
parameters θ, has the form [26]

$$ I(F) = c_k \int dk\, F(k|\theta) \left| \frac{\partial \ln F(k|\theta)}{\partial \theta} \right|^2 , \tag{1} $$

where F(k|θ) is the density distribution in configuration
space and the constant c_k accounts for proper dimensionality.
According to MFI tenets [17], the equilibrium state
of the system minimizes I subject to prior conditions, such
as the normalization of F, namely ⟨1⟩ = 1. The MFI is
then cast as a variational problem of the form [17]

$$ \delta \{ I(F) - \mu \langle 1 \rangle \} = 0 , \tag{2} $$

where μ is the normalization-associated Lagrange multiplier.



2 Theoretical method 



City-size distributions and electoral results display a similar
scale-free behavior, and both of them have the same
constituents: groups of people. Although the resources of
these groups or the interests of the individuals composing
them may be different in each case, a naive approach is to
assume that people are connected to other people, hence
giving rise to a network where groups of interest develop.
Network theory [13] has been successfully used before for
dealing with electoral results and the spread of opinions,
which encourages us to employ it to develop the microscopic
description of the associated systems.



2.1.2 Application of the MFI to a scale-free network 

In the case of the derivation of the DD of our complex network,
we define a minimum degree of unity and a maximum
degree value of k_M. With the change of variable
u = ln k, the scale transformation k' = k/θ_k transforms
u into u' = u − Θ_k, where Θ_k = ln θ_k. The distribution
of physical elements is then described by the mono-parametric
translation family F(k|θ_k) = f(u|Θ_k) = f(u').
Taking into account the fact that the Jacobian of the
transformation is dk = e^u du, the information measure I
can be obtained in the continuous limit as

$$ I = c_u \int_0^{\ln k_M} du\, e^u f(u) \left| \frac{\partial \ln f(u)}{\partial u} \right|^2 \tag{3} $$
and, with the normalization constraint

$$ \int_0^{\ln k_M} du\, e^u f(u) = 1 , \tag{4} $$

the variational problem now reads

$$ \delta \left\{ c_u \int_0^{\ln k_M} du\, e^u f \left| \frac{\partial \ln f}{\partial u} \right|^2 + \mu \int_0^{\ln k_M} du\, e^u f \right\} = 0 . \tag{5} $$

Introducing f(u) = e^{-u} Ψ²(u) and varying with respect
to Ψ leads to the Schrödinger-like equation [17]

$$ \left[ -4 \frac{\partial^2}{\partial u^2} + 1 + \mu' \right] \Psi(u) = 0 , \tag{6} $$

where μ' = μ/c_u. The general solution to this equation
is Ψ(u) = e^{-αu/2} with α = √(1 + μ'). Equilibrium corresponds
to the ground-state solution α = 0 [17], which
yields the same density distribution as that of the SFIG
in the thermodynamic limit,

$$ F(k)\,dk = \frac{1}{\ln k_M} \frac{dk}{k} . \tag{7} $$



Once we know, for a given total number of nodes N, the
associated DD of the SFIN, we proceed by assigning a
number of potential edges k to each node, with k randomly
drawn from F(k). The nodes are then
randomly connected among themselves by their assigned
edges, with two restrictions: a node cannot be connected
to itself, nor twice to the same node. The process
ends when no more connections can be established.
We will show later that the values of N and k_M can be
arbitrarily chosen in order to classify the empirical distributions.
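
The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the results reported here: the function names are hypothetical, the continuous law F(k) of Eq. (7) is discretized by flooring a log-uniform draw, and the random matching is done in a single pass in which any pair of potential edges that would create a self-loop or a repeated link is simply discarded.

```python
import random

def sample_degrees(n_nodes, k_max, rng):
    """Draw integer degrees from F(k) dk = dk / (k ln k_max), 1 <= k <= k_max."""
    # Inverse-CDF sampling: if u is uniform in [0, 1), k_max**u is log-uniform.
    # Flooring to an integer is an assumption of this sketch.
    return [min(k_max, int(k_max ** rng.random())) for _ in range(n_nodes)]

def build_network(degrees, rng):
    """Randomly pair potential edges, rejecting self-loops and repeated links."""
    # One "stub" per potential edge of each node.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    neighbors = [set() for _ in degrees]
    # Pair consecutive stubs; pairs violating a restriction are dropped.
    for a, b in zip(stubs[0::2], stubs[1::2]):
        if a != b and b not in neighbors[a]:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors

rng = random.Random(0)
degrees = sample_degrees(5000, 50, rng)
network = build_network(degrees, rng)
```

Because rejected pairs are not retried, the realized degree of a node can be smaller than its assigned number of potential edges; a more faithful implementation would keep matching until no further connection is possible.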



2.2 The competitive cluster growth process 

Once we have built a SFIN with N nodes and maximum
degree k_M, we apply a competitive cluster growth process
to it, as discussed in [27]. This technique falls into
the category of PGP or discrete multiplicative processes,
which are known to correctly describe scale-invariant behavior
[22]. To begin, we fix the values of the minimum
cluster size n_1 and the total number of clusters n_c that will
grow in the network. Next, n_c × n_1 nodes of the network
are randomly selected as cluster "seeds". In the first iteration
the first neighbors of the seeds are incorporated into
the cluster in random order, unless they are seeds of other
clusters. At subsequent iterations, the first neighbors of
the nodes added at the preceding step are, in turn, randomly
added to the cluster, unless they already belong to
a different cluster. The process ends when all the nodes
belong to some cluster. We display in Fig. 1 the final result
of a competitive growth process for n_c = 100 clusters
with an initial size of n_1 = 1 node in a network of 5 000
nodes.
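
A minimal sketch of this growth process, for the case of single-node seeds (n_1 = 1), is given below. The function name and the adjacency-list representation (one set of neighbors per node) are assumptions made for illustration, and a ring of 200 nodes stands in for the SFIN so that the block is self-contained.

```python
import random

def grow_clusters(neighbors, n_clusters, rng):
    """Competitive cluster growth from n_clusters single-node seeds."""
    n_nodes = len(neighbors)
    owner = [-1] * n_nodes                      # -1 marks a free node
    seeds = rng.sample(range(n_nodes), n_clusters)
    for cluster, seed in enumerate(seeds):
        owner[seed] = cluster
    frontier = list(seeds)
    while frontier:
        rng.shuffle(frontier)                   # random order of incorporation
        added = []
        for node in frontier:
            for nb in neighbors[node]:
                if owner[nb] == -1:             # skip nodes already claimed
                    owner[nb] = owner[node]
                    added.append(nb)
        frontier = added                        # only newly added nodes grow next
    sizes = [0] * n_clusters
    for cluster in owner:
        if cluster >= 0:
            sizes[cluster] += 1
    return owner, sizes

# Toy stand-in for the network: a ring in which node i touches i-1 and i+1.
ring = [{(i - 1) % 200, (i + 1) % 200} for i in range(200)]
owner, sizes = grow_clusters(ring, 10, random.Random(1))
```

Nodes that cannot be reached from any seed would keep the label -1; in a connected network every node ends up in exactly one cluster, as required by the stopping condition above.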
Fig. 1. (Color online) Final result of a competitive process
of n_c = 100 clusters with an initial size of n_1 = 1 node in
a network of N = 5 000 nodes. All the clusters belong to the
same network and are connected among themselves. For clarity's
sake, however, they have been plotted independently. A
large variety of cluster sizes ensues.



The procedure may include a probability t for a node
changing to another cluster if any of its neighbors belongs
to this cluster (micro-dynamics). However, M. Batty found
that even if the rank of the cities rapidly evolves in time
due to micro-dynamics, the city-size distribution evolves
only slowly [28]: the system evolves quasi-statically at the
macroscopic level. We then consider that the system exhibits
an adiabatic evolution, implying that our distributions
can be well represented by stationary configurations
(t = 0). Although not at this stage, we expect to study
micro-dynamics in the future.



3 Present results 



3.1 Study and classification of cluster-size distributions 



3.1.1 Regime of low-density of clusters: recovering the SFIG 



We have studied the size distribution of SFIN clusters
with N and k_M ranging from N = 5 000 to 500 000 and
k_M = 50 to 500; finite-size effects may be relevant for
smaller values of N and k_M. When the density of seeds,
defined as ρ_s = n_c n_1/N, is much lower than unity, the
probability density distribution of sizes p(x) mostly follows
that of the SFIG at equilibrium, which is, in the continuous
limit,

$$ p(x)\,dx = \begin{cases} \dfrac{1}{\Omega} \dfrac{dx}{x} & \text{if } x_1 \le x \le x_M \\ 0 & \text{otherwise} \end{cases} \tag{8} $$

where Ω = ln(x_M/x_1) is the "volume" in the concomitant
size-configuration space. The maximum x_M and minimum
x_1 sizes generally depend on n_1, n_c, and N. Since
finite-size effects make it difficult to estimate x_1 and x_M,
we have found it useful to evaluate the volume as Ω =
2 ln(x_{3/4}/x_{1/4}), where x_{1/4} and x_{3/4} indicate the first and
third quartiles of the distribution.

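It is convenient for a scale-invariant system to introduce a
new variable x' = x/θ without changing the physics, with
θ a parameter to be defined later. Furthermore, we can
rescale the volume to Ω' = ζΩ according to

$$ x' = \left( \frac{x}{\theta} \right)^{\zeta} , \tag{9} $$

which leads to the scaled distribution

$$ p(x')\,dx' = \begin{cases} \dfrac{1}{\Omega'} \dfrac{dx'}{x'} & \text{if } \left( \dfrac{x_1}{\theta} \right)^{\zeta} \le x' \le \left( \dfrac{x_M}{\theta} \right)^{\zeta} \\ 0 & \text{otherwise.} \end{cases} \tag{10} $$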



Note that these changes do not affect the properties of
the distribution, which remains that of a SFIG. It is also
useful to employ the reduced units defined by θ = x_{1/2}
and Ω' = 2, where x_{1/2} is the median of the distribution.
In this particular case, for the new variable y defined by
the transformation

$$ y = \left( \frac{x}{x_{1/2}} \right)^{1/\ln(x_{3/4}/x_{1/4})} , \tag{11} $$

the density distribution takes the form

$$ p(y)\,dy = \begin{cases} \dfrac{1}{2} \dfrac{dy}{y} & \text{if } e^{-1} \le y \le e \\ 0 & \text{otherwise.} \end{cases} \tag{12} $$



For convenience we define a "normalized" rank parameter
r in such a way that, for all the pertinent "sizes" considered
here, it ranges within the interval [0,1]. The normalized
rank-size distribution associated with the density distribution
(12) is then cast as

$$ y = e^{1-2r} . \tag{13} $$

Note that the density distribution (12) and the associated
rank-size distribution (13) no longer depend on n_1, n_c, N, or k_M,
since x_M and x_1 do not enter the definition of the maximum
and minimum sizes when expressed in such units.
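
As an illustration of the change to reduced units, the sketch below rescales a list of sizes by its median and quartiles, assuming the transformation reads y = (x/x_{1/2})^{1/ln(x_{3/4}/x_{1/4})}, and pairs each scaled size with a normalized rank in [0,1]. The function names and the midpoint convention for the rank, r = (i + 1/2)/n, are assumptions of this example. The input is an idealized log-uniform sample, for which the scaled rank-size curve should lie close to e^{1-2r}.

```python
import math
import statistics

def reduced_units(sizes):
    """Scale sizes by the median and by the inter-quartile log-ratio."""
    q1, median, q3 = statistics.quantiles(sizes, n=4)
    zeta = 1.0 / math.log(q3 / q1)      # exponent that fixes the volume to 2
    return [(x / median) ** zeta for x in sizes]

def rank_size(values):
    """Pairs (r, y) with y sorted in decreasing order and r in (0, 1)."""
    ordered = sorted(values, reverse=True)
    n = len(ordered)
    return [((i + 0.5) / n, y) for i, y in enumerate(ordered)]

# Idealized sample: sizes evenly spaced in log scale between 10 and 10 000.
n = 1000
sizes = [10.0 * 1000.0 ** ((i + 0.5) / n) for i in range(n)]
curve = rank_size(reduced_units(sizes))
# Largest deviation of ln y from the ideal-gas prediction 1 - 2r.
max_dev = max(abs(math.log(y) - (1.0 - 2.0 * r)) for r, y in curve)
```

For empirical data the same two functions can be applied directly to a list of city populations or vote counts; deviations of the resulting curve from e^{1-2r} are then the quantity of interest.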



on the size of the neighboring clusters. The size distribution
exhibits important deviations from the SFIG, but the
change to reduced units still makes it possible to compare
distributions obtained with different values of n_1,
n_c, N and k_M. These comparisons have led us to a
classification of the distributions using a parameter λ,
which we call competitiveness, that we now proceed
to discuss.

Network configuration theory tells us that, for a given degree
distribution, the mean number of j-th neighbors of a
node is [13]

$$ z_j = \left( \frac{z_2}{z_1} \right)^{j-1} z_1 , \tag{14} $$

where z_1 and z_2 are the mean number of first and second
neighbors, respectively. Consequently, the mean size ⟨x⟩_c
of the cluster generated for each seed is, at the end of the
process,

$$ \langle x \rangle_c = \sum_{j=1}^{j_f} z_j \equiv \lambda^{-1} z_1 , \tag{15} $$

where j_f is the mean number of total iterations used in the
process, and λ is a new parameter defined by (15) for
future convenience. Since all nodes of the network belong,
at the end of the process, to a certain cluster, the mean
size times the number of seeds must be equal to the total
number of nodes, i.e.,

$$ \langle x \rangle_c \, n_c n_1 = N . \tag{16} $$

For a scale-free ideal network, z_1 = ⟨k⟩ = (k_M − 1)/ln k_M,
which gives for large k_M

$$ \lambda = \frac{n_c n_1}{N} \frac{k_M}{\ln k_M} = \rho_s \frac{k_M}{\ln k_M} . \tag{17} $$



We interpret λ as a quantifier of the strength of the interactions
and use it to classify the family of distributions
obtained via our simulations. In our simulations we have
studied distributions with values ranging from λ → 0,
where the SFIG emerges naturally, up to λ ≈ 10 for
very high density and a very connected network, or very
small world [29]. We have found no evidence of an
upper bound on λ. We display in Fig. 2 the rank-size distribution
y(λ, r) in semi-log scale for different values of the competitiveness λ.
These curves have been obtained by generating several
networks and computing a large number of competitive
processes within them to reduce numerical fluctuations.
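
For orientation, the competitiveness of a given simulation setup can be evaluated directly from its parameters. The sketch below assumes that the large-k_M relation takes the form λ = ρ_s k_M/ln k_M with ρ_s = n_c n_1/N, and also offers the finite-k_M variant λ = ρ_s z_1 with z_1 = (k_M − 1)/ln k_M. The function names are hypothetical, and the value k_M = 50 used in the example is an assumed input, since the maximum degree of the network of Fig. 1 is not quoted.

```python
import math

def mean_degree(k_max):
    """Mean degree z_1 = <k> of a scale-free ideal network."""
    return (k_max - 1) / math.log(k_max)

def competitiveness(n_clusters, n_min, n_nodes, k_max, large_k=True):
    """lambda = rho_s * z_1, with z_1 ~ k_max / ln(k_max) for large k_max."""
    rho_s = n_clusters * n_min / n_nodes        # density of seeds
    z1 = k_max / math.log(k_max) if large_k else mean_degree(k_max)
    return rho_s * z1

# 100 single-node seeds in 5 000 nodes; k_max = 50 is an assumed value.
lam = competitiveness(100, 1, 5000, 50)
```

In this form λ grows linearly with the density of seeds and almost linearly with the maximum degree, which is why both dense seeding and highly connected networks lead to strong competition.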



3.2 Study and classification of empirical distributions 
by competitiveness 



3.1.2 Regime of high density of clusters: classification by 
competitiveness 

When we increase the number of clusters, the competi- 
tion for space grows and the size of a cluster depends now 



We have found that the change to the reduced units y
performed above can be applied to empirical data to compare
the distributions for city sizes and electoral results
in different countries or states. It allows one to compare the
effect of societal features, such as policies and economic
Fig. 2. (Color online) Scaled rank-size distribution y(λ, r) in
semi-log scale for different values of competitiveness λ. These
curves have been obtained by computing a large number of
competitive processes to reduce statistical fluctuations.

data. Furthermore, a comparison between these distributions
and those obtained via our simulations can be performed
to assign a competitiveness value to the empirical
data. This assignment is effected by minimizing the distance
between the data and the computed curve y(λ, r),
using a Kolmogorov-Smirnov test [30] (some examples are
displayed in Figs. 3 to 9 for electoral results and city
populations). Since our simulation fits the data nicely, we
conclude that, in general, the scaled distributions
of city populations and electoral results can be
classified according to the values of λ.
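
The assignment step can be sketched as a nearest-curve search under the two-sample Kolmogorov-Smirnov distance. Everything below is illustrative: the function names are hypothetical, the statistic is computed by hand rather than with a statistics library, and the two candidate curves are stand-ins (the ideal curve y = e^{1-2r} and an arbitrary steeper toy curve), not simulated members of the λ-family.

```python
import bisect
import math

def ks_distance(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for v in a + b:                     # the supremum is reached at a data point
        cdf_a = bisect.bisect_right(a, v) / len(a)
        cdf_b = bisect.bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

def assign_competitiveness(data, candidates):
    """Label of the candidate curve closest to the data in KS distance."""
    return min(candidates, key=lambda label: ks_distance(data, candidates[label]))

ranks = [(i + 0.5) / 200 for i in range(200)]
ideal = [math.exp(1 - 2 * r) for r in ranks]            # lambda -> 0 limit
steeper = [math.exp(2 * (1 - 2 * r)) for r in ranks]    # toy stand-in only
candidates = {"ideal (Eq. 13)": ideal, "toy stand-in": steeper}
best = assign_competitiveness(ideal[::4], candidates)
```

In an actual classification the dictionary would map each simulated value of λ to the scaled sizes obtained from the corresponding competitive processes, and the data would be the empirical sizes after the change to the variable y.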



3.2.1 City size distributions 

We have performed an exhaustive city-population study
for the provinces of Spain [31]. We have fitted each distribution
to y(λ, r) and have found a competitiveness distribution
with a median of λ_{1/2} = 0.65, reflecting some
local dependence. We depict in Figs. 3 and 4 the scaled
rank-size distributions of some provinces, together with
the accompanying λ-family of distributions, which fit
the data nicely. We have found that the rank-size distribution
of the capital cities has a competitiveness of 0.71 (Fig. 3f),
which does not differ significantly from the median value.





Fig. 3. (Color online) Classification of city-size distributions
by competitiveness (λ). a, scaled rank-plot of the city population
of Girona province (λ = 1.74). b, Bizkaia (λ = 1.63).
c, Castello (λ = 0.65). d, Cuenca (λ = 0.58). e, Granada
(λ = 0.06). f, capital cities of Spanish provinces (λ = 0.71).
All empirical data are plotted with red dots, and compared
to the rank-size distributions obtained with a numerical simulation
employing the same value of competitiveness (black
lines).



We contend that the fact that this distribution can be
classified by competitiveness is a signature of the scale-invariant
nature of the social system: the whole country can
be thought of as a single network of (only) capital cities,
which displays statistical properties similar to those of the
complete network, which includes all cities.

We have detected some singular exceptions in the fitting
of these curves, as illustrated in Fig. 5 for the provinces
of Guadalajara and Malaga. We understand that these
deviations from the λ-family of distributions reflect local
effects in policies or in social, economic or geographical
factors, as some studies have found [32]. In the case of
Guadalajara, the uppermost cities in the plot, those that
deviate from the best fit, are located in the neighborhood
of Madrid, the capital of Spain, which thus
affects the population distribution in its neighborhood.

Fig. 4. (Color online) Classification of city-size distributions
by competitiveness (λ). a, scaled rank-plot of the city population
of Tarragona province (λ = 0.69). b, Caceres (λ = 0.65). c,
Burgos (λ = 0.65). d, Cantabria (λ = 0.59). e, Avila (λ = 0.43).
f, La Rioja (λ = 0.33). All empirical data are plotted using
red dots, and compared to the rank-size distributions obtained
with our numerical simulation employing the same value of
competitiveness (black lines).

Fig. 5. (Color online) Examples of deviations from the λ-family
of distributions. a, scaled rank-plot of the city population
of Guadalajara province, compared with the best fit. Deviations
are seen in the case of the cities at the graph's top (see
text). b, Malaga, where the above deviations are also present.
All empirical data are plotted using red dots, and the simulation
with black lines.

Fig. 6. (Color online) Plot of the competitiveness λ of the
Spanish provinces versus the empirical value of the maximum
degree k_M. The medians of both parameters are represented by
lines.

The empirical maximum degree k_M, which defines the volume
of the SFIN in configuration space, has been estimated
for each province by solving the equation

$$ \lambda = \frac{\hat{n}_c \hat{n}_1}{N} \frac{k_M}{\ln k_M} . \tag{18} $$



Here N is the total population, n̂_c the number of cities
and n̂_1 the population of the smallest city, which is used
to estimate the minimum cluster size. We have found that
the empirical distribution of the maximum degree exhibits
a long tail, with a median of 166. The first and third quartiles
are 37 and 319, respectively, hence



(19) 
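
Since the relation between λ and k_M involves k_M/ln k_M, it has no closed-form solution for k_M and must be inverted numerically. A minimal bisection sketch follows, assuming the equation reads λ = (n̂_c n̂_1/N) k_M/ln k_M; the function name and the tolerance are choices made for this example, and the solver is not claimed to be the one used for the values reported here.

```python
import math

def solve_k_max(lam, n_cities, n_smallest, population, tol=1e-9):
    """Solve lam = (n_cities * n_smallest / population) * k / ln(k) for k > e."""
    target = lam * population / (n_cities * n_smallest)
    if target < math.e:
        # k / ln(k) has its minimum value e at k = e.
        raise ValueError("no solution: k / ln(k) never falls below e")
    lo, hi = math.e, 2.0 * math.e
    while hi / math.log(hi) < target:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:           # k / ln(k) is increasing for k > e
        mid = 0.5 * (lo + hi)
        if mid / math.log(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Round trip: build lambda from k_M = 150 and recover the maximum degree.
rho_s = 250 * 1 / 100000
k_est = solve_k_max(rho_s * 150 / math.log(150), 250, 1, 100000)
```

Only the branch k_M > e is searched, which is the relevant one for social networks, where the maximum degree is far larger than unity.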



We show in Fig. 6 the competitiveness versus the empirical
value of the maximum degree for the Spanish provinces.
The medians of both parameters are also shown. The maximum
number of connections is an observable that has
been evaluated before in the literature and is known as
Dunbar's number [33]. It can be found in the fields of
anthropology, evolutionary psychology, and sociology. It
reflects the fact that the maximum number of individuals
with whom any person can maintain stable social relationships
is determined by the size of their neocortex
[34]. Dunbar's number lies between 100 and 230, but a
commonly detected value is 150, which fits our results
quite well. As far as we know, the present work is the first in
which Dunbar's number is computed using a mathematical
model based on first principles. We have checked with the
case of the province of Teruel that a SFIN with a maximum
degree of 150 is able to reproduce the associated
empirical distribution without the need to scale it via
the change to the variable y. A total population of 109 810
inhabitants, excluding the capital city (we have already
seen that the capital belongs to a larger national network),
distributed into 235 cities, is modelled by a network of i)
N = 100 000 nodes, ii) a maximum degree of k_M = 150,
and iii) by "growing" n_c = 250 clusters with an initial
size of n_1 = 1 node. We depict the rank-size distribution
in Fig. 7, where it can be clearly seen that the simulation
fits the data nicely. The city-size distribution is also
Fig. 7. (Color online) Top panel, rank-size distribution of
Teruel, Spain (red dots) compared to the result of competitive
processes carried out in networks with N = 100 000 nodes
and where Dunbar's number is the maximum degree, k_M = 150
(black line). The size distributions are not scaled, which implies
a one-to-one relation between inhabitants and nodes. We also
show the result of the process in a BA network, which is not
able to reproduce the empirical distribution (see text). Bottom
panel, average path length, computed (blue dots) and
extrapolated (black line), for networks with maximum degree
k_M = 150. The extrapolation (see text) reproduces recent
empirical measures.



compared with the cluster growth process obtained in a 
Barabasi-Albert network (BA) with the same number of 
nodes and clusters. In this figure we see that not all kinds 
of networks will be able to reproduce the empirical distri- 
bution, even if we employ a similar number of nodes and 
clusters, as in the case of the BA network. 

The average path length (APL) of a network is defined
as the average number of steps along the shortest path,
taken over all possible pairs of nodes. It is one of the most
important quantities characterizing a network's topology [13].
We have numerically computed the APL of a SFIN with
k_M = 150 as a function of N, up to N = 100 000 nodes.
One easily sees the expected dependence on log N, as illustrated
in the bottom panel of Fig. 7. The extrapolation gives APL = 4.00
for a SFIN of 100 000 nodes, APL = 5.63 for 45 000 000
nodes (population of Spain), APL = 6.13 for 300 000 000
nodes (population of the USA [35]), and APL = 6.95 for
6 500 000 000 nodes (world population). These values are
in accordance with the empirical measure of Travers and
Milgram, known as the "six degrees" [36], and with the
more recent results of P. S. Dodds et al., who found an APL




Fig. 8. (Color online) Classification of electoral result distributions
by competitiveness (λ). a, scaled rank-plot of the 2005
election results in the UK (λ = 6.5). b, 2004 election results in
the USA (λ = 4.6). c, 2008 election results in Italy (λ = 2.7).
d, 2008 election results in Spain (λ = 0.98). All empirical data
are plotted using red dots, and are compared to the rank-size
distributions obtained via a simulation employing the same
value of competitiveness (black lines).

between 5 and 7 [37], or J. Leskovec and E. Horvitz, who
found 6.6 degrees between Messenger users [25]. These results
indicate that the "six degrees of separation" is a direct
consequence of Dunbar's number.
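
The logarithmic dependence of the APL on the network size can be checked against the four extrapolated values quoted above. The sketch below fits a straight line in log10 N to those values by ordinary least squares; it is a consistency check on the quoted numbers only, under the assumption that the extrapolation is linear in log N, and it does not reproduce the extrapolation procedure itself. The function name is hypothetical.

```python
import math

# (number of nodes, extrapolated APL) as quoted in the text.
quoted = [(1e5, 4.00), (4.5e7, 5.63), (3e8, 6.13), (6.5e9, 6.95)]

def fit_log_line(points):
    """Least-squares fit of APL = intercept + slope * log10(N)."""
    xs = [math.log10(n) for n, _ in points]
    ys = [apl for _, apl in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_log_line(quoted)
# Distance of each quoted value from the fitted line.
residuals = [apl - (intercept + slope * math.log10(n)) for n, apl in quoted]
```

Small residuals indicate that the quoted values are mutually consistent with a single straight line in log N, so that each tenfold increase in population adds the same fixed amount to the average path length.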

3.2.2 Electoral results 

We have carried out a similar competitiveness study for
the results of General Elections in different countries and
computed the λ value in the cases of UK'05 [38], USA'04
[39], Italy'08 [40], and Spain'08 [41], finding λ = 6.5, 4.6,
2.7, and 0.98, respectively (Fig. 8), all values being larger
than the median found for city populations. In general, a
high value of the competitiveness increases the difference
in the number of votes between two consecutive parties in
the rank of results.

The estimates of the maximum degree are k_M ≈ 600 000,
3 000 000, 19 000, and 35 000, respectively, i.e., orders
of magnitude larger than Dunbar's number. Thus,
the volume in configuration space of the SFIN that describes
the election process is larger than that for the city
population. This is the effect of the creation and development
of temporary connections. A politician, journalist,
or blog writer can easily be connected during the electoral
campaign to thousands of people via mass media,
such as television, newspapers, or the Internet. In accordance
with Dodds' results, the world becomes smaller
(more connected) when individual incentives exist [37], in
this case to obtain good electoral results. These findings
lead to interesting conclusions. In the USA's case we find






A. Hernando et al.: Unravelling the size distribution of social groups 



larger hubs than in the UK: 3 000 000 connections against
600 000, but since the total population is N_p = 300 000 000
against 61 000 000 [42], the relative value is similar for each
country, k_M/N_p ≈ 0.01. This value indicates that the USA
and the UK have similar social networks in electoral campaigns,
but scaled. Since there are more parties competing
in elections in the UK's case, the distribution of the results
naturally displays a higher competitiveness than in the
USA's.



3.2.3 The universal distribution 

Studying the city population of different countries around
the world [43] we have found that, for countries with a
population over 5 000 000, the main portion of the scaled
distributions turns out to be quite similar, in fact the same
distribution, thus evidencing some degree of universality,
as illustrated in Fig. 9 for the USA and Germany. Even the
distribution of the size of companies in these countries
follows this behavior, as depicted in the same figure for
USA firms [35]. This universal distribution can be reproduced
by our simulation. Note that the competitiveness
has a local dependence, and thus the data of a country are
in fact several sets of data (for many states or provinces
of that country), which have different values of the competitiveness.
We have simulated this universal distribution
by mixing data generated with different values of competitiveness,
between 0.4 and 1, obtaining the curve y_a(r),
which fits the empirical distributions nicely, as can be seen
in Fig. 9.
Fig. 9. (Color online) Top panel, scaled city-size distribution
for the USA (green squares) and Germany (blue triangles), and
scaled firm-size distribution of the USA (red circles) compared
to the universal distribution generated with our numerical simulation
(black line) by mixing a large amount of data with different
values of competitiveness. Bottom panel, same as top
panel for the scaled rank-size distribution in log-log scale.



4 Summary and discussion 

We have shown in this communication that the main properties
of the city-size distributions and electoral results can
be well reproduced when interactions between network elements
are introduced by means of a competitive cluster
growth process in a SFIN. We classify the deviations from
the SFIN distribution in terms of just a single parameter,
the competitiveness λ, that quantifies the strength of
the interaction between the elements of the system. As
expected, the SFIG distribution emerges naturally in the
limit of low competitiveness. The value of λ can be easily
extracted from empirical data by using the transformation
to reduced units given in Eq. (11) and then comparing the
scaled distribution to a distribution of known competitiveness.

In our simulations this parameter is related to the total
density of clusters and to the maximum degree of the
network (or the volume in the configuration space) by
Eq. (17). For real systems, our results in the study of
the Spanish provinces indicate that this relation remains
valid. We have used it to compute the empirical median
of the maximum degree, finding that it reproduces Dunbar's
number [34]. Furthermore, the rank-size distribution
of Teruel is reproduced using real values for the density
of cities together with a maximum degree in a SFIN. Our



simulations also predict the empirical estimate of the average
path length when we use Dunbar's number for the
maximum degree of the SFIN. This indicates that the
known "six degrees of separation" [36] is a consequence
of Dunbar's number. For electoral results, we have found
that the maximum degree grows by orders of magnitude
(the volume in the configuration space grows), which
confirms the statement that the world is more connected
when individual incentives do play a role [37].

Some studies have found correlations between city-size distributions
and regional policies [32]. We believe that the
λ parameter would be a very useful tool in such studies
for classifying the ensuing distributions. Systematically
assessing the dependence of the competitiveness on local
policies could represent an advance in the social and
political sciences. As seen in the case of
electoral results, a high value of the competitiveness enhances
the difference (in number of votes) between two
consecutive parties in the results rank. This implies that
a small party would prefer a scenario with a low value
of λ in order to get better chances in the final tallies,
whereas a big party would choose a high value in order to
increase the relative difference with the other parties. For
city sizes, a low value of the competitiveness works against
supersaturated cities, whereas a high value promotes the
importance of a capital city.

In general, all empirical distributions agree quite well with
those obtained with our simulation, but we have also found
some singular exceptions. We expect these to be related to
the already mentioned regional policies, and to historical
or geographical factors. Thus, our model could help to
identify such scenarios. Exhaustive studies of data around
the world are necessary to build a bridge between the three
variables of Eq. (17), ρ_s, k_M and λ, and the social and
economic policies of a region. It is also reasonable to think
that a study, in terms of competitiveness, of the evolution of
firm-size distributions during recent years may lead to
a deeper understanding of the present economic situation.
Summing up, our results show that scale-invariant thermodynamics
yields a useful framework for dealing with
scale-invariant phenomena. Its application to the social
sciences here has provided some deeper insight into the way
humans build up a society. This work represents only a
first step, and it is expected that subsequent studies will
enhance the predictive power of the theory.

and to Albert Diaz, Carles Panades, Joan Manel Hernandez,